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Abstract. We intend to develop part of the theoretical tools needed for the detection 
of gravitational waves coming from the capture of a compact object, 1-100 M Q , by a 
Supermassive Black Hole, up to a 10 9 M Q , located at the centre of most galaxies. The 
analysis of the accretion activity unveils the star population around the galactic nuclei, 
and tests the physics of black holes and general relativity. The captured small mass 
is considered a probe of the gravitational field of the massive body, allowing a precise 
measurement of the particle motion up to the final absorption. The knowledge of the 
gravitational signal, strongly affected by the self-force - the orbital displacement due to 
the captured mass and the emitted radiation - is imperative for a successful detection. 
The results include a strategy for wave equations with a singular source term for all 
type of orbits. We are now tackling the evolution problem, first for radial fall in Regge- 
Wheeler gauge, and later for generic orbits in the harmonic or de Donder gauge for 
Schwarzschild-Droste black holes. In the Extreme Mass Ratio Inspiral, the determina- 
tion of the orbital evolution demands that the motion of the small mass be continuously 
corrected by the self-force, i.e. the self-consistent evolution. At each of the integra- 
tion steps, the self-force must be computed over an adequate number of modes; further, 
a differential-integral system of general relativistic equations is to be solved and the 
outputs regularised for suppressing divergences. Finally, for the provision of the com- 
putational power, parallelisation is under examination. 



1. How motion of a particle is affected by its own mass and the emitted radiation 

A particle, of z a coordinates, follows the geodesic given by 

^-^nw-o (i) 

dr ~ dr + l » vU U " U ' Uj 

where r, b T^ v , u a = dz a jdr are the proper time, Christoffel symbol and four- velocity 
in the background (b) metric g^y, respectively. Let us now consider the same particle 
moving in a perturbed metric. 
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In the restricted two-body problem, iBlanchet et al.1 (12011 ). the particle infinitesi- 
mal si ze implies that the perturbations diverge at the particle itself. iDetweiler & Whiting 
(120031) adapted Dime's a pproach to the self-forc e equation - the MiSaTaQuWa equation 
from iMino et all dl997h : lOuinn & Waldl dl997l) . In flat spacetime, the radiative Green 
function is obtained by subtracting the singular contribution, half-advanced plus half- 
retarded, from the retarded Green function. The singular part does not exert any force 
on the particle, upon which only the regular field acts. In curved spacetime, the at- 
tainment of the radiative Green function passes through the inclusion of an additional, 
purposely built, function H. This approach emphasises that the motion is a geodesic of 
the full (f) metric g^ = g^v + h^v where h R v is the radiative part of the perturbations, and 
it implies two notable features: the regularity of the radiative field and the avoidance of 
any non-causal behaviour. The radiative R component is conceptually given by 



R = Ret - Sing - Ret - -[Ret + Adv - H] = -[Ret - Adv + H] , (2) 

where the ad hoc function H is defined to agree with the advanced Green function when 
the particle is in the future of the evaluation point (H = Adv); and to the retarded Green 
function when the particle is in the past of the evaluation point (H = Ret), but differs 
from zero in the intermediate values of the world-line outside the light-cone. Thus, the 
radiative component includes the state of motion at all times prior to the advanced time 
and it is not a representation of the physical field, but rather of an effective field. Indeed, 
H goes to zero when the evaluation point coincides with the particle position. 

We define z a = z a + Az°" as the coordinates of the particle in the full metric. The 
geodesic is given by 

*=* + ^ = 0, (3) 

where A, f T« v , u a = dz a /dA are the proper time, Christoffel symbol and four-velocity 
in the full metric, respectively. We wish to compute the difference between the two 
geodesies, knowing that the final equation of motion of the particle in the perturbed 
background is given by a tota i = D 2 z a /dr 2 + D 2 Az a /dr 2 . Obviously, the gauge freedom 
allows to choose a comoving coordinate frame where no acceleration occurs. After 
some considerable manipulation, we get 

= -V"^« v -^ + u a i/)(2h R ^. v -h R ^u v . (4) 

s v j 

Background geodesic deviation vis i t ^ ti? 

° ° i elj —acceleration MisalaQuWa 

Stemmed from geodesic principles, an exact geodesic deviation equation at first order 
is obtained by subtracting the background from the perturbed motion, equation (0]). 
The first right-hand side term depends on the background metric, while the second 
depends upon the perturbations gener ated by the part i cle m ass m, and it is the non- 



trivial MiSaTaQuWa self-acceleration. iGralla & Waldl (|2008l) adduce that a first order 



perturbation scheme will let grow away from the exact solution at late times, and that 
no different destiny will occur to a second or higher order scheme at even later times. 
They assert that it is preferable i) to drop searching higher order self-force expressions; 
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ii) to evolve the trajectory by continuously and iteratively applying the correction given 
by the second term, while disregarding the first term. 

The self-force is defined in the harmonic or de Donder (dD) gauge, where the 
ten metric components aren't combined into a wave equation , as in the Regge-Wheeler 
(RW) gauge. But, computation in other gauges. iBarack & OrilfcOOll) . it is often not pos- 



sible, as the variation due to the change from dD to a new gauge (G), SF^^ ^ does 
not admit a well defined value. One exception is constituted by the radial trajectory, 
where the two self-forces (dD and RW gauges) can be made equal. The regularisation 
process subtracts the diverging or singular part (represented by the regularisation pa- 
rameters A", B a , C a ' , D a , which are gauge independent, and to be computed in the dD 
gauge) from the full perturbations, following 



(5) 



where L - i + 1, € indicating the mode, and ± represents the two sides at the particle 
coordinate. For the non-adiabatic radial fall (radial coordinate r and particle position in 
the background, r p ), in RW gauge and in coordinate tim e, the expression corresponding 
to the self-force is given by ISpallicci & Aoudial (120041) 



A 



2 = ( A ±2 - A l L ~ B° ~ C a IL) - D° , 



(6) 



f=0 



where A%,B a ,C a ,D a are derived from equation [5] and the corresponding untilded reg- 
ularisation parameters, and 
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being M the black hole mass, r p the particle velocity, H { x , perturbations (of C° continu- 
ity class) drawn by the gauge-invariant Moncrief wave function ip. The latter is derived 
from the Regge-Wheeler-Zerilli wave equation (V e potential, r* tortoise coordinate) 



«/(r, t) - F e (r)d(r - r p (tj) + G c (r)jS(r - r p (t)) . (8) 

The perturbations, and thereby A2, depend upon m the mass of the particle-star. 
The back-action shows as a correction Ar p , that is r p {t) = r p (t) + Ar p (f), and it obeys to 
a f-ODE, corresponding to equation @] 

Ar p = AoC^y, r p , r p )Ar p + Aj (g^, r p , r p )Ar p + A 2 (h pv , r p , r p ) . (9) 

The iterative approach, figure [T] demands an accurate reinterpretation of equation 
[9] Firstly, for an infinitesimal time step, Ao and Aj vanish. Secondly, the A2 parameter 
is to be computed on the new trajectory: indeed, Ar p represents here the difference with 



+ V c (r) 

dt 2 + dr *2 ^> 



4 



A. Spallicci et al. 



i> l (r p {t),t) 



„new _ „old 



Ar„ 



Hi 

J 




Figure 1. Iteration scheme for 

computation of the evolving orbit. Figure 2. Fourth order scheme. 



the trajectory computed at the previous integration step, and not anymore the back- 
ground trajectory at start. Thirdly, each single iterated position and velocity may be 
identified with the coordinates of a particle possessing the same values and moving 
on a - to be determined - geodesic. This approach sums up the effects computed on 
successive osculating orbits, i.e. stretches of geodesies. 



2. The algorithm 

Classical finite difference methods have to be adapted to deal with the discontinuity of 
the wave function \p and its derivatives on the trajectory r p (t) due to the infinitesimal 
size of the particle. Analytically derived jump conditio ns on \jt and derivatives ar e 



used as guideline and reference throughout the integration, lAoudia & Spalliccil (120111) : 
bitter et all (1201 lh . Fourth order accuracy on i// has been reached to compute the metric 
perturbations and their first derivatives (thereby implying third order derivatives of ifr) 



n+m<4 



+ 0(h 5 ) , (10) 



VQ nmt \a = lim Q nmt - lim Q nmt , (11) 

r-*r^(fo-) r^r p (t tT ) 
if r*(ti) < r* p {t,) : q t = , else q t = q % , (12) 

for i = [B,C...J], Q nme {r,t) = d n n &f\\j\r,f), Tf' m) are Taylor coefficients and q t are 
constants depending on the way the particle crosses the cells, Figure |2] 

3. Parallel computing 

Parallelisation allows better performance, in terms of resolution and processing time, 
and it is an evident aid for the computation of orbital evolution. At this preliminary 
stage though, only the non-iterative code has been worked upon. The availability of par- 
allel hardware doesn't imply an immediate exploitation of its capacity, as a simulation 
program often needs refurbishment. The original sequential algorithm was improved 
by using loop unrolling and cache optimisation. The modified version runs seven times 
faster, and it is used as standard reference. The following parallel techniques have 
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been investigated and tested on a machine equipped with two quad-core AMD Opteron 
running at 2.3GHz. 

SSE instructions. The SSE (Streaming SIMD - Single Instruction, Multiple Data - 
Extension) technology works with double-precision floating-point instructions applied 
onto a single arithmetical operation simultaneously, thus doubling the computational 
efficiency. However, it requires to explicitly deal with the operations between the main 
memory and the processor SSE registers, while taking care of the memory alignment 
constraints for efficiency. This implies the redesign and rewriting of the algorithms for 
those instructions. On one core, the SSE implementation achieves a 1.6 speedup over 
the reference implementation. A speedup of 2 wasn't achieved, since the bus between 
the main memory and the processor was left unaltered, and it was unable to feed the 
SSE registers quickly enough to reach peak performance. 

SSE instructions + Multi-Threading. The exploitation of multiple processors 
or cores in a shared-memory computer, requires setting up threading mechanisms to 
assign the workload. In our case, this is rather straightforward as the elements of the 
domain can be computed separately. However, a linear speedup wasn't achieved, since 
threads need to be synchronised at the end of each main loop iteration. Indeed, speedup 
doesn't scale linearly with the number of processors. Using eight processors, we get 
a speedup of 4 over the reference implementation and of 2.5 over the mono-core SSE 
implementation. 

CUDA. GPUs (Graphic Processing Unit) are massive multi-core processors (more 
than 1500 cores in the latest cards) integrated into a single chip. CUDA (Compute 
Unified Device Architecture) is a practical architecture for general-purpose computing 
on Nvidia GPUs. Porting our algorithm to CUDA, it requires to specify how to split 
the work over the cores. Frequent synchronisations are limiting, due to the very large 
number of cores. We also have to manage the data movements between the main and 
the GPU memories. The CUDA implementation is currently in progress, and thereby 
not yet fully optimised. With a Nvidia GTX680 card with 1536 cores, the preliminary 
implementation achieves a speedup of 5.6 over the reference implementation. However, 
there is still room for considerable optimisation. 

4. Conclusions 

We have developed some theoretical and computing tools for studying bodies motion 
under self-force, for a specific case. Generalisation to other non-adiabatic orbits are 
under consideration. Details are given in published and upcoming references. 
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